The three rings to rule the world in R and spatial analysis are sf, tigris, and ggplot or leaflet. It is even easier in the U.S. if you also have the tidycensus package. In this exercise we will analyse a bird flight path and the health of the bird’s forest home by using these super powers. Our steps will be as follows:
Say you want to measure how far a bird must fly from their nest
in St. Paul, MN down to the center of Smoky Mountain National Park in
Tennessee. Let’s set up 2 points, and then measure how far apart they
are. Next we will develop a polygon that is a straight path between
these two nesting locations and join with the states to develop a final
list of states that the birds will fly through and make some maps. We
will of course follow up this exercise by setting up strict
conservation laws to protect these flight paths.
A tree in the center of St. Paul.
We set up the simplest point data, 1 point using their latitude and longitude. We use the st_point function to tell R that these are coordinates. The st_sfc function creates a simple feature geometry list. Then we finally set up point as a simple feature using the st_as_sf() function and include the coordinate reference system.
library(sf)
library(tidyverse)
lat_y <- 44.9283966 #Always remember that latitude is a ladder.
long_x <- -93.1734448
coords <- c(long_x, lat_y)
nest_mn <- st_sfc(st_point(coords))
nest_mn <- st_as_sf(nest_mn, crs = 4326)
A tree in the center of Smoky Mountain National Park.
Now we make another point in Tenneessee. Remember points are singular geometries and have no area. Sometimes, if you are attempting to intersect points with polygons, you may have to draw a buffer around them.
lat_y <- 35.5811962
long_x <- -83.8609472
coords <- c(long_x, lat_y) # making a list of coordinates, still just a list of numbers to R
nest_smnp <- st_sfc(st_point(coords)) # setting up the geometry list datum
nest_smnp <- st_as_sf(nest_smnp, crs = 4326) # providing the coordinate reference system
How far apart are these two nests? We need to convert the coordinate reference systems to make sure that the two locations are comparable and our final answer is in meters. You can change coordinate reference systems by using the function st_transform. Next we use the st_distance. Since we are using a coordinate reference system with units in meters, our distance units will be meters as well. So, we can convert to kilometers by dividing by 1000.
nest_mn_meters <- st_transform(nest_mn, crs = 9822)
nest_smnp_meters <- st_transform(nest_smnp, crs = 9822)
tree_distances <- st_distance(nest_mn_meters, nest_smnp_meters)
tree_distances <- tree_distances/1000
Click here to open and download
We will make a data frame with our two polygons by binding the
two points together as rows of a table using the rbind
function. Then, let’s say that the birds fly in roughly a straight
line. So, let’s cast these two points into a line between them by first
joining them using the
st_union function and then casting them to a “linestring” (this is a
line geometry) using the
st_cast function.
tree_points <- rbind(nest_mn, nest_smnp)
tree_route <- tree_points %>%
summarize(geometry = st_union(.)) %>%
st_cast("LINESTRING") %>%
mutate(route = "Minnesota to Smoky Mountain National Park")
plot(tree_route$geometry)
We can be fairly sure the birds won’t fly in an exactly straight line from Minnesota to Tennessee. So, let’s put a buffer around this line and make it a polygon. Let’s assume the birds will fly within 1000 meters on either side of our line. So, REMEMBER about units? Let’s check what our units currently are (hint latitude and longitude have what units?), and then change the units and draw our new polygon. It will be glorious.
st_crs(tree_route)
## Coordinate Reference System:
## User input: EPSG:4326
## wkt:
## GEOGCRS["WGS 84",
## ENSEMBLE["World Geodetic System 1984 ensemble",
## MEMBER["World Geodetic System 1984 (Transit)"],
## MEMBER["World Geodetic System 1984 (G730)"],
## MEMBER["World Geodetic System 1984 (G873)"],
## MEMBER["World Geodetic System 1984 (G1150)"],
## MEMBER["World Geodetic System 1984 (G1674)"],
## MEMBER["World Geodetic System 1984 (G1762)"],
## MEMBER["World Geodetic System 1984 (G2139)"],
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]],
## ENSEMBLEACCURACY[2.0]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["geodetic latitude (Lat)",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["geodetic longitude (Lon)",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## USAGE[
## SCOPE["Horizontal component of 3D system."],
## AREA["World."],
## BBOX[-90,-180,90,180]],
## ID["EPSG",4326]]
tree_route <- st_transform(tree_route, crs = 9822)
st_crs(tree_route)
## Coordinate Reference System:
## User input: EPSG:9822
## wkt:
## PROJCRS["RGF93 v2 / CC42",
## BASEGEOGCRS["RGF93 v2",
## DATUM["Reseau Geodesique Francais 1993 v2",
## ELLIPSOID["GRS 1980",6378137,298.257222101,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",9777]],
## CONVERSION["France Conic Conformal zone 1",
## METHOD["Lambert Conic Conformal (2SP)",
## ID["EPSG",9802]],
## PARAMETER["Latitude of false origin",42,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8821]],
## PARAMETER["Longitude of false origin",3,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8822]],
## PARAMETER["Latitude of 1st standard parallel",41.25,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8823]],
## PARAMETER["Latitude of 2nd standard parallel",42.75,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8824]],
## PARAMETER["Easting at false origin",1700000,
## LENGTHUNIT["metre",1],
## ID["EPSG",8826]],
## PARAMETER["Northing at false origin",1200000,
## LENGTHUNIT["metre",1],
## ID["EPSG",8827]]],
## CS[Cartesian,2],
## AXIS["easting (X)",east,
## ORDER[1],
## LENGTHUNIT["metre",1]],
## AXIS["northing (Y)",north,
## ORDER[2],
## LENGTHUNIT["metre",1]],
## USAGE[
## SCOPE["Cadastre, engineering survey."],
## AREA["France onshore - mainland south of 43°N and Corsica."],
## BBOX[41.31,-1.06,43.07,9.63]],
## ID["EPSG",9822]]
tree_route <- st_buffer(tree_route, dist = 1000)
tree_route <- st_transform(tree_route, crs = 4326)
plot(tree_route)
Looking at that shape in our Plots window is satisfying, but doesn’t tell us anything about what states the polygon goes through, and we’d like a pretty map in the end. So, let’s get started. First, we are going to pull in our tigris package and grab the U.S. states’ polygons.
library(tigris)
all_states <- states()
Now let’s see which states intersect with our polygon bird path. We bring back some sf magic for this. But remember, CRS must be the same!! SO, we will need to do some transformation. This time we use st_crs to transform one geometry to the CRS of the other one. Then we can use st_filter to filter one geometry to the extent of the other.
st_crs(all_states)
## Coordinate Reference System:
## User input: NAD83
## wkt:
## GEOGCRS["NAD83",
## DATUM["North American Datum 1983",
## ELLIPSOID["GRS 1980",6378137,298.257222101,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["latitude",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["longitude",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4269]]
st_crs(tree_route)
## Coordinate Reference System:
## User input: EPSG:4326
## wkt:
## GEOGCRS["WGS 84",
## ENSEMBLE["World Geodetic System 1984 ensemble",
## MEMBER["World Geodetic System 1984 (Transit)"],
## MEMBER["World Geodetic System 1984 (G730)"],
## MEMBER["World Geodetic System 1984 (G873)"],
## MEMBER["World Geodetic System 1984 (G1150)"],
## MEMBER["World Geodetic System 1984 (G1674)"],
## MEMBER["World Geodetic System 1984 (G1762)"],
## MEMBER["World Geodetic System 1984 (G2139)"],
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]],
## ENSEMBLEACCURACY[2.0]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["geodetic latitude (Lat)",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["geodetic longitude (Lon)",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## USAGE[
## SCOPE["Horizontal component of 3D system."],
## AREA["World."],
## BBOX[-90,-180,90,180]],
## ID["EPSG",4326]]
tree_route <- st_transform(tree_route, crs = st_crs(all_states))
bird_route_states <- st_filter(all_states, tree_route)
Now we have two types of polygons, a set of states that intersect with our bird path and the bird flight path itself. We will make a map of these two shapes in two ways. First with leaflet then with ggplot2.
library(leaflet)
leaflet() %>%
addTiles() %>%
addPolygons(data = bird_route_states) %>%
addPolygons(data = tree_route, weight = 8, fillColor = 'darkorange', color = 'darkorange')
Now, ggplot2
ggplot() +
geom_sf(data = bird_route_states, aes(fill = NAME)) +
geom_sf_text(data = bird_route_states, aes(label = NAME)) +
geom_sf(data = tree_route, aes(fill = route), size = 10) +
theme(legend.position = "none")
Now let’s protect some birds! Let’s check out the health of the forest by county in the state of Minnesota. We will need to know the locations of the counties in Minnesota, and we will also need some forest health data.
The Minnesota Geospatial Commons is an excellent source of geospatial data in Minnesota and very searchable.
If you search forest health you will find some forest health datasets of large scale tree canopy damage prepared for several years. We will use the most recent, 2021. Let’s read in a shapefile using the sf package. You will note that the shapefile is zipped. So we first create two temporary folders to begin the process of grabbing this file, next unzipping it, and then reading it in as a simple feature (sf object). The temp and temp2 folders, and the shapefile, is saved in a local temporary folder. If you want to save the shapefile on your computer for the longterm, you can use the st_write() function in the sf package.
# Create temp files
temp <- tempfile()
temp2 <- tempfile()
# Download the zip file and save to 'temp'
URL <- "https://resources.gisdata.mn.gov/pub/gdrs/data/pub/us_mn_state_dnr/env_forest_health_survey_2021/shp_env_forest_health_survey_2021.zip"
download.file(URL, temp)
# Unzip the contents of the temp and save unzipped content in 'temp2'
unzip(zipfile = temp, exdir = temp2)
# Read the shapefile. Alternatively make an assignment, such as f<-sf::read_sf(your_SHP_file)
forest_damage <- sf::st_read(temp2)
## Reading layer `forest_health_survey_2021' from data source
## `C:\Users\kellickson\AppData\Local\Temp\RtmpwJyWro\file3e40512ca36'
## using driver `ESRI Shapefile'
## Simple feature collection with 2106 features and 10 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 270884.5 ymin: 4825811 xmax: 684315 ymax: 5467517
## Projected CRS: NAD83 / UTM zone 15N
We can look at the data in two ways, view the attributes by clicking forest_damage in the Environment in R Studio. Go ahead, and do that now. Look at the geometry column, that is where the spatial information is saved and it is where R looks for it. Each row in this dataset is a different polygon, but are all a part of the same dataset or shapefile.
Next, we can plot the data simply in base R to make sure it has the appropriate geographic information and plots as a map. This is generally a good check to make sure you’ve succesfully pulled in a proper shapefile.
plot(forest_damage[1])
When we look at this, we see that there are many polygons, and if we look at the data (by typing view(forest_damage), or clicking on forest_damage in the environment) we can see that there are severity ratings for each polygon. Let’s do a very simple analysis and sum the area in each county that has damage to tree canopy. Then, we can join to a county geospatial dataset to look at the places in MN with the most treecanopy damage and see if our bird was affected. First let’s use the power of tigris and pull in the county shapefile for Minnesota.
counties <- counties(state = "MN")
And, of course we want to check to see if our CRS match, because we want to join these two geospatial datasets.
st_crs(counties)
## Coordinate Reference System:
## User input: NAD83
## wkt:
## GEOGCRS["NAD83",
## DATUM["North American Datum 1983",
## ELLIPSOID["GRS 1980",6378137,298.257222101,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["latitude",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["longitude",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4269]]
st_crs(forest_damage)
## Coordinate Reference System:
## User input: NAD83 / UTM zone 15N
## wkt:
## PROJCRS["NAD83 / UTM zone 15N",
## BASEGEOGCRS["NAD83",
## DATUM["North American Datum 1983",
## ELLIPSOID["GRS 1980",6378137,298.257222101,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4269]],
## CONVERSION["UTM zone 15N",
## METHOD["Transverse Mercator",
## ID["EPSG",9807]],
## PARAMETER["Latitude of natural origin",0,
## ANGLEUNIT["Degree",0.0174532925199433],
## ID["EPSG",8801]],
## PARAMETER["Longitude of natural origin",-93,
## ANGLEUNIT["Degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["Scale factor at natural origin",0.9996,
## SCALEUNIT["unity",1],
## ID["EPSG",8805]],
## PARAMETER["False easting",500000,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",0,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]]],
## CS[Cartesian,2],
## AXIS["(E)",east,
## ORDER[1],
## LENGTHUNIT["metre",1]],
## AXIS["(N)",north,
## ORDER[2],
## LENGTHUNIT["metre",1]],
## ID["EPSG",26915]]
The forest damage dataset has a coordinate reference system in UTMs (Universal Transverse Mercator), and remember that those have units of meters, which are very nice for working with area measurements. So, let’s convert the county dataset into UTMs, zone 15 (Minnesota’s main UTM zone).
counties <- st_transform(counties, crs = 26915)
Now, we can run a spatial intersect on the two geospatial datasets. The st_intersection function returns the polygons where the two geometries intersect, here’s a short description with images. All of the polygons in the forest_damage dataset will be split into their respective Minnesota county. So, some of the forest damage polygons may be split across several counties, this will add to our work. But no worries, R has us covered!
forest_damage_counties <- st_intersection(counties, forest_damage)
Next, we can calculate the area for each of the
forest_damage polygons within each joined county. Some of our
polygons might look a bit like this,
forest_damage_counties <- forest_damage_counties %>%
mutate(damage_area = st_area(geometry))
Now, we are going to sum the areas with tree canopy tree data within each county, returning the following information: the total area in the polygon and the name of the County. We no longer need each individual polygon geometry, so we set the geometry of the dataset to NULL.
st_geometry(forest_damage_counties) <- NULL
forest_damage_counties <- forest_damage_counties %>%
group_by(NAME) %>%
summarise(total_damaged_area = sum(damage_area, na.rm = T)) %>%
ungroup() %>%
mutate(total_damaged_area = as.numeric(total_damaged_area))
We need to join the summed forest damage data to the county spatial data so that we have the information we need to present the data as a map.
counties_forest_damage <- left_join(counties, forest_damage_counties, by = "NAME")
Now, we replace all missing values with zeroes. You always want to think a bit before replacing NULLs with zeroes. In this case, we have some information to justify that the missing values are where there was no forest canopy damage and so the forest canopy damage area would be zero (based on the reports used in this dataset). Sometimes a NULL means there is no information, which means that you don’t know what the value is and you do not have adequate information for replacements.
counties_forest_damage <- counties_forest_damage %>%
mutate(total_damaged_area = replace_na(total_damaged_area, 0))
Finally we get to make a map. Remember, data prep is more than half the battle!
ggplot(data = counties_forest_damage, aes(fill = total_damaged_area)) +
geom_sf()
Our bird originated in St. Paul, MN so it looks like there wasn’t a lot of damaged forest canopy where our bird lived. That’s good news!
Happy spatial analysis friends!